1.
Gigascience ; 12, 2022 Dec 28.
Article in English | MEDLINE | ID: covidwho-20242676

ABSTRACT

BACKGROUND: Literature about SARS-CoV-2 widely discusses the effects of variations that have spread in the past 3 years. Such information is dispersed in the texts of several research articles, hindering the possibility of practically integrating it with related datasets (e.g., millions of SARS-CoV-2 sequences available to the community). We aim to fill this gap by mining literature abstracts to extract, for each variant/mutation, its related effects (in epidemiological, immunological, clinical, or viral kinetics terms) with labeled higher/lower levels in relation to the nonmutated virus. RESULTS: The proposed framework comprises (i) the provisioning of abstracts from a COVID-19-related big data corpus (CORD-19) and (ii) the identification of mutation/variant effects in abstracts using a GPT2-based prediction model. The above techniques enable the prediction of mutations/variants with their effects and levels in 2 distinct scenarios: (i) the batch annotation of the most relevant CORD-19 abstracts and (ii) the on-demand annotation of any user-selected CORD-19 abstract through the CoVEffect web application (http://gmql.eu/coveffect), which assists expert users with semiautomated data labeling. On the interface, users can inspect the predictions and correct them; user inputs can then extend the training dataset used by the prediction model. Our prototype model was trained through a carefully designed process, using a minimal and highly diversified pool of samples. CONCLUSIONS: The CoVEffect interface serves for the assisted annotation of abstracts, allowing the download of curated datasets for further use in data integration or analysis pipelines. The overall framework can be adapted to resolve similar unstructured-to-structured text translation tasks, which are typical of biomedical domains.


Subject(s)
COVID-19 , Deep Learning , Humans , SARS-CoV-2/genetics , COVID-19/genetics , Mutation , Kinetics
2.
5th International Conference on Intelligent Computing in Data Sciences, ICDS 2021 ; 2021.
Article in English | Scopus | ID: covidwho-1672722

ABSTRACT

Science has time and again proven to be one of the most powerful tools for finding solutions to the problems faced by the world. Whether the challenges are natural or man-made, the hard work put into finding efficient answers to them has helped safeguard the ecosystem. Sometimes the research community is put under pressure when humanity faces a challenge to its survival, such as the Covid-19 pandemic. A large body of published work needs to be studied to find an optimal solution to existing or new queries related to the virus. In this research work, we build an efficient data mining tool using the CORD-19 Dataset to help the community come up with answers to Covid-19 related questions. We use a combination of semantic and keyword search to reduce the solution space of our model. Our model makes use of parallelism, paraphrasing, and state-of-the-art natural language processing techniques, and serves as a time- and energy-saving tool for the information needs of doctors and researchers who are trying to put an end to the pandemic and avoid possible future outbreaks. © 2021 IEEE.

3.
J Med Libr Assoc ; 109(3): 395-405, 2021 Jul 01.
Article in English | MEDLINE | ID: covidwho-1463959

ABSTRACT

OBJECTIVE: We analyzed the COVID-19 Open Research Dataset (CORD-19) to understand leading research institutions, collaborations among institutions, major publication venues, key research concepts, and topics covered by pandemic-related research. METHODS: We conducted a descriptive analysis of authors' institutions and relationships, automatic content extraction of key words and phrases from titles and abstracts, and topic modeling and evolution. Data visualization techniques were applied to present the results of the analysis. RESULTS: We found that leading research institutions on COVID-19 included the Chinese Academy of Sciences, the US National Institutes of Health, and the University of California. Research studies mostly involved collaboration among different institutions at national and international levels. In addition to bioRxiv, major publication venues included journals such as The BMJ, PLOS One, Journal of Virology, and The Lancet. Key research concepts included the coronavirus, acute respiratory impairments, health care, and social distancing. The ten most popular topics were identified through topic modeling and included human metapneumovirus and livestock, clinical outcomes of severe patients, and risk factors for higher mortality rate. CONCLUSION: Data analytics is a powerful approach for quickly processing and understanding large-scale datasets like CORD-19. This approach could help medical librarians, researchers, and the public understand important characteristics of COVID-19 research and could be applied to the analysis of other large datasets.


Subject(s)
Academies and Institutes/statistics & numerical data , Biomedical Research/statistics & numerical data , COVID-19/diagnosis , COVID-19/physiopathology , COVID-19/therapy , Periodicals as Topic/statistics & numerical data , Research Report , Bibliometrics , China , Humans , SARS-CoV-2 , United States
4.
J Indian Inst Sci ; 100(4): 725-731, 2020.
Article in English | MEDLINE | ID: covidwho-1235804

ABSTRACT

This short paper describes a web resource, the NIST CORD-19 Web Resource, for community explorations of the COVID-19 Open Research Dataset (CORD-19). The tools for exploration in the web resource make use of the NIST-developed Root- and Rule-based method, which exploits underlying linguistic structures to create terms that represent phrases in a corpus. The method allows for auto-suggesting related terms, helping users discover terms that refine a search of a heterogeneous COVID-19 document base. The method also produces taxonomic structures in the target domain and provides semantic information about the relationships between terms. This term structure can serve as a basis for creating topic modeling and trend analysis tools. In this paper, we describe the use of a novel search engine to demonstrate some of the capabilities above.
